
    A Study of Snippet Length and Informativeness: Behaviour, Performance and User Experience

    The design and presentation of a Search Engine Results Page (SERP) has been subject to much research. With many contemporary aspects of the SERP now under scrutiny, work still remains in investigating more traditional SERP components, such as the result summary. Prior studies have examined a variety of different aspects of result summaries, but in this paper we investigate the influence of result summary length on search behaviour, performance and user experience. To this end, we designed and conducted a within-subjects experiment using the TREC AQUAINT news collection with 53 participants. Using Kullback-Leibler distance as a measure of information gain, we examined result summaries of different lengths and selected four conditions where the change in information gain was the greatest: (i) title only; (ii) title plus one snippet; (iii) title plus two snippets; and (iv) title plus four snippets. Findings show that participants broadly preferred longer result summaries, as they were perceived to be more informative. However, their performance in terms of correctly identifying relevant documents was similar across all four conditions. Furthermore, while the participants felt that longer summaries were more informative, empirical observations suggest otherwise: while participants were more likely to click on relevant items given longer summaries, they were also more likely to click on non-relevant items. These results suggest, first, that longer is not necessarily better, even though participants perceived it to be; and second, that there is a positive relationship between the length and informativeness of summaries and their attractiveness (i.e. clickthrough rates). These findings show that there are tensions between perception and performance that need to be taken into account when designing result summaries.
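    As a rough illustration of the information-gain measurement described above, the sketch below compares unigram language models of a shorter and a longer summary with Kullback-Leibler distance. The smoothing, tokenisation and example snippets are assumptions for illustration, not the authors' exact procedure.

    from collections import Counter
    import math

    def unigram_lm(text, vocab, eps=1e-9):
        """Maximum-likelihood unigram model over a shared vocabulary, with a
        small floor (an assumed smoothing choice) so log-ratios stay finite."""
        counts = Counter(text.lower().split())
        total = sum(counts.values()) or 1
        return {w: counts[w] / total + eps for w in vocab}

    def kl_information_gain(longer_summary, shorter_summary):
        """KL(longer || shorter): a rough proxy for how much extra information
        a longer result summary carries over a shorter one."""
        vocab = set(longer_summary.lower().split()) | set(shorter_summary.lower().split())
        p = unigram_lm(longer_summary, vocab)
        q = unigram_lm(shorter_summary, vocab)
        return sum(p[w] * math.log(p[w] / q[w]) for w in vocab)

    # Example: gain from adding a second snippet to a title-plus-one-snippet summary.
    gain = kl_information_gain(
        "flood warnings issued as river levels rise across the region overnight",
        "flood warnings issued as river levels rise",
    )
    print(f"information gain: {gain:.3f}")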

    Transfer Learning for Multi-language Twitter Election Classification

    Both politicians and citizens are increasingly embracing social media as a means to disseminate information and comment on various topics, particularly during significant political events, such as elections. Such commentary during elections is also of interest to social scientists and pollsters. To facilitate the study of social media during elections, there is a need to automatically identify posts that are topically related to those elections. However, current studies have focused on elections within English-speaking regions, and hence the resultant election content classifiers are only applicable for elections in countries where the predominant language is English. Meanwhile, as social media becomes more prevalent worldwide, there is an increasing need for election classifiers that can be generalised across different languages, without building a training dataset for each election. In this paper, based upon transfer learning, we study the development of effective and reusable election classifiers for use on social media across multiple languages. We combine transfer learning with different classifiers such as Support Vector Machines (SVM) and state-of-the-art Convolutional Neural Networks (CNN), which make use of word embedding representations for each social media post. We generalise the learned classifier models for cross-language classification by using a linear translation approach to map the word embedding vectors from one language into another. Experiments conducted over two election datasets in different languages show that without using any training data from the target language, linear translations outperform a classical transfer learning approach, namely Transfer Component Analysis (TCA), by 80% in recall and 25% in F1 measure.
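    The linear translation step mentioned above can be pictured as fitting a matrix that maps word vectors from one language's embedding space into the other's, learned by least squares over a bilingual dictionary. The sketch below is one plausible reading of that idea; the dimensions, random data and averaging into a post representation are illustrative assumptions rather than the paper's exact setup.

    import numpy as np

    def fit_translation_matrix(src_vecs, tgt_vecs):
        """Solve min_W ||src_vecs @ W - tgt_vecs||_F^2 over aligned dictionary pairs."""
        W, *_ = np.linalg.lstsq(src_vecs, tgt_vecs, rcond=None)
        return W

    def translate_post(word_vecs, W):
        """Map each word vector of a post into the other language's space and
        average them into a single feature vector for a downstream classifier."""
        return (word_vecs @ W).mean(axis=0)

    rng = np.random.default_rng(0)
    src_dict = rng.normal(size=(1000, 100))   # stand-in source-language embeddings
    tgt_dict = rng.normal(size=(1000, 100))   # their dictionary translations
    W = fit_translation_matrix(src_dict, tgt_dict)

    tweet_vectors = rng.normal(size=(12, 100))   # word embeddings of one post
    features = translate_post(tweet_vectors, W)
    print(features.shape)                        # (100,) feature vector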

    An integrated approach to rotorcraft human factors research

    As the potential of civil and military helicopters has increased, more complex and demanding missions in increasingly hostile environments have been required. Users, designers, and manufacturers have an urgent need for information about human behavior and function to create systems that take advantage of human capabilities without overloading them. Because there is a large gap between what is known about human behavior and the information needed to predict pilot workload and performance in the complex missions projected for pilots of advanced helicopters, Army and NASA scientists are actively engaged in human factors research at Ames. The research ranges from laboratory experiments to computational modeling, simulation evaluation, and in-flight testing. Information obtained in highly controlled but simpler environments generates predictions which can be tested in more realistic situations. These results are used, in turn, to refine theoretical models, provide the focus for subsequent research, and ensure operational relevance, while maintaining predictive advantages. The advantages and disadvantages of each type of research are described, along with examples of experimental results.

    An analysis of query difficulty for information retrieval in the medical domain

    We present a post-hoc analysis of a benchmarking activity for information retrieval (IR) in the medical domain to determine if performance for queries with different levels of complexity can be associated with different IR methods or techniques. Our analysis is based on data and runs for Task 3 of the CLEF 2013 eHealth lab, which provided patient queries and a large medical document collection for the development of patient-centred medical information retrieval techniques. We categorise the queries based on their complexity, which is defined as the number of medical concepts they contain. We then show how query complexity affects the performance of runs submitted to the lab, and provide suggestions for improving retrieval quality for this complex retrieval task and similar IR evaluation tasks.
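    A minimal sketch of the complexity categorisation described above, assuming a simple dictionary lookup stands in for proper medical concept extraction (the lab's actual extraction tooling is not reproduced here):

    # Hypothetical concept dictionary; a real analysis would use a medical
    # concept extractor rather than substring matching.
    MEDICAL_CONCEPTS = {"hypertension", "shortness of breath",
                        "lupus nephritis", "thrombotic thrombocytopenic purpura"}

    def count_concepts(query: str) -> int:
        q = query.lower()
        return sum(1 for concept in MEDICAL_CONCEPTS if concept in q)

    def complexity_bucket(query: str) -> str:
        n = count_concepts(query)
        return "simple" if n <= 1 else "moderate" if n == 2 else "complex"

    print(complexity_bucket("patients with hypertension and shortness of breath"))  # moderate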

    Unbiased Comparative Evaluation of Ranking Functions

    Eliciting relevance judgments for ranking evaluation is labor-intensive and costly, motivating careful selection of which documents to judge. Unlike traditional approaches that make this selection deterministically, probabilistic sampling has shown intriguing promise since it enables the design of estimators that are provably unbiased even when reusing data with missing judgments. In this paper, we first unify and extend these sampling approaches by viewing the evaluation problem as a Monte Carlo estimation task that applies to a large number of common IR metrics. Drawing on the theoretical clarity that this view offers, we tackle three practical evaluation scenarios: comparing two systems, comparing k systems against a baseline, and ranking k systems. For each scenario, we derive an estimator and a variance-optimizing sampling distribution while retaining the strengths of sampling-based evaluation, including unbiasedness, reusability despite missing data, and ease of use in practice. In addition to the theoretical contribution, we empirically evaluate our methods against previously used sampling heuristics and find that they generally cut the number of required relevance judgments at least in half.
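    To make the sampling idea concrete, the sketch below estimates precision at k from a probabilistic sample of judgments, reweighting each judged document by the inverse of its sampling probability so the estimate stays unbiased despite missing judgments. The metric, the sampling distribution and the toy data are illustrative assumptions, not the variance-optimal designs derived in the paper.

    import random

    def estimate_precision_at_k(ranked_docs, sample_prob, judgments, k=10):
        """Horvitz-Thompson-style estimate of P@k for one query.

        ranked_docs : doc ids in ranked order
        sample_prob : doc id -> probability the doc was selected for judging
        judgments   : doc id -> 1 (relevant) / 0, present only for sampled docs
        """
        total = 0.0
        for doc in ranked_docs[:k]:
            if doc in judgments:                            # judged only if sampled
                total += judgments[doc] / sample_prob[doc]  # inverse-probability weight
        return total / k

    # Toy usage: sample higher-ranked documents with higher probability.
    ranking = [f"d{i}" for i in range(20)]
    probs = {d: max(0.2, 1.0 / (i + 1)) for i, d in enumerate(ranking)}
    sampled = {d for d in ranking if random.random() < probs[d]}
    judgments = {d: int(d in {"d0", "d3", "d7"}) for d in sampled}  # pretend relevance
    print(estimate_precision_at_k(ranking, probs, judgments, k=10))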

    Evaluating Variable-Length Multiple-Option Lists in Chatbots and Mobile Search

    In recent years, the proliferation of smart mobile devices has led to the gradual integration of search functionality within mobile platforms. This has created an incentive to move away from the "ten blue links" metaphor, as mobile users are less likely to click on them, expecting to get the answer directly from the snippets. In turn, this has revived the interest in Question Answering. Then came chatbots, conversational systems, and messaging platforms, where the user's needs could be better served by the system asking follow-up questions in order to better understand the user's intent. While typically a user would expect a single response to any utterance, a system could also return multiple options for the user to select from, based on different system understandings of the user's intent. However, this possibility should not be overused, as the practice could confuse and/or annoy the user. How to produce good variable-length lists, given the conflicting objectives of staying short while maximizing the likelihood of having a correct answer included in the list, is an underexplored problem. It is also unclear how to evaluate a system that tries to do that. Here we aim to bridge this gap. In particular, we define some necessary and some optional properties that an evaluation measure fit for this purpose should have. We further show that existing evaluation measures from the IR tradition are not entirely suitable for this setup, and we propose novel evaluation measures that address it satisfactorily.
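    For intuition only, one candidate family of measures trades list length against the chance of containing a correct answer, for instance by discounting a hit by the number of options shown. The sketch below is an assumed, illustrative measure and is not one of the measures proposed in the paper.

    def penalized_hit(options, correct_answers, gamma=0.85):
        """Score gamma**(len(options) - 1) if any option is correct, else 0:
        shorter correct lists score higher, a missing answer scores nothing.
        gamma is an assumed per-option reading penalty."""
        if any(option in correct_answers for option in options):
            return gamma ** (len(options) - 1)
        return 0.0

    print(penalized_hit(["pay by cash"], {"pay by cash"}))                        # 1.0
    print(penalized_hit(["pay by card", "pay by cash"], {"pay by cash"}))         # 0.85
    print(penalized_hit(["pay by card", "pay later", "other"], {"pay by cash"}))  # 0.0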

    Modelling Relevance towards Multiple Inclusion Criteria when Ranking Patients

    In the medical domain, information retrieval systems can be used for identifying cohorts (i.e. patients) required for clinical studies. However, a challenge faced by such search systems is to retrieve the cohorts whose medical histories cover the inclusion criteria specified in a query, which are often complex and include multiple medical conditions. For example, a query may aim to find patients with both 'lupus nephritis' and 'thrombotic thrombocytopenic purpura'. In a typical best-match retrieval setting, any patient exhibiting all of the inclusion criteria should naturally be ranked higher than a patient that only exhibits a subset, or none, of the criteria. In this work, we extend the two main existing models for ranking patients to take into account the coverage of the inclusion criteria by adapting techniques from recent research into coverage-based diversification. We propose a novel approach for modelling the coverage of the query inclusion criteria within the records of a particular patient, and thereby rank highly those patients whose medical records are likely to cover all of the specified criteria. In particular, our proposed approach estimates the relevance of a patient based on the mixture of the probability that the patient is retrieved by a patient ranking model for a given query, and the likelihood that the patient's records cover the query criteria. The latter is measured using the relevance towards each of the criteria stated in the query, represented in the form of sub-queries. We thoroughly evaluate our proposed approach using the test collection provided by the TREC 2011 and 2012 Medical Records track. Our results show significant improvements over existing strong baselines.
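    The mixture described above can be sketched as an interpolation between the base ranking model's score for a patient and an estimate of how well that patient's records cover every criterion (sub-query). The min-based coverage estimate, the score scales and the interpolation weight below are assumptions for illustration, not the paper's estimators.

    def coverage_likelihood(subquery_scores):
        """Likelihood that all inclusion criteria are covered; the weakest
        criterion dominates (an assumed aggregation choice)."""
        return min(subquery_scores) if subquery_scores else 0.0

    def patient_score(base_prob, subquery_scores, lam=0.5):
        """Interpolate the base retrieval probability with criteria coverage."""
        return lam * base_prob + (1 - lam) * coverage_likelihood(subquery_scores)

    # Patient A matches both criteria; patient B largely misses the second one.
    print(patient_score(0.62, [0.55, 0.48]))   # 0.55
    print(patient_score(0.70, [0.80, 0.05]))   # 0.375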